Custom floating - point arithmetic for integer processors : algorithms , implementation , and selection
نویسندگان
چکیده
Media processing applications typically involve numerical blocks that exhibit regular floating-point computation patterns. For processors whose architecture supports only integer arithmetic, these patterns can be profitably turned into custom operators, coming in addition to the five basic ones (+, −, ×, / and √ ), but achieving better performance by treating more operations. This thesis addresses the design of such custom operators as well as the techniques developed in the compiler to select them in application codes. We have designed optimized implementations for a set of custom operators which includes squaring, scaling, adding two nonnegative terms, fused multiply-add, fused square-add (x2 + z with z > 0), two-dimensional dot products (DP2), sums of two squares, as well as simultaneous addition/subtraction and sine/cosine. With novel algorithms targeting high instruction-level parallelism and detailed here for squaring, scaling, DP2, and sin/cos, we achieve speedups of up to 4.2x for individual custom operators even when subnormal numbers are fully supported. Furthermore, we introduce the optimizations developed in the ST231 C/C++ compiler for selecting such operators. Most of the selections are achieved at high level, using syntactic criteria. However, for fused square-add, we also enhance the framework of integer range analysis to support floating-point variables in order to prove the required positivity condition z > 0. Finally, we provide quantitative evidence of the benefits to support this selection of custom operations: on DSP kernels and benchmarks, our approach allows us to be up to 1.59x faster compared to the sole usage of basic ones.
منابع مشابه
Implementation of binary floating-point arithmetic on embedded integer processors - Polynomial evaluation-based algorithms and certified code generation
Today some embedded systems still do not integrate their own floating-point unit, for area, cost, or energy consumptionconstraints. However, this kind of architectures is widely used in application domains highly demanding on floating-point calculations (multimedia, audio and video, or telecommunications). To compensate this lack of floating-pointhardware, floating-point arithmetic ...
متن کاملImplementation of Custom Precision Floating Point Arithmetic on FPGAs
F loating point arithmetic is a common requirement in signal processing, image processing and real time data acquisition & processing algorithms. Implementation of such algorithms on FPGA requires an efficient implementation of floating point arithmetic core as an initial process. We have presented an empirical result of the implementation of custom-precision floating point numbers on an FPGA p...
متن کاملAn Efficient Implementation of Floating Point Multiplier using Verilog
To represent very large or small values, large range is required as the integer representation is no longer appropriate. These values can be represented using the IEEE754 standard based floating point representation. Floating point multiplication is a most widely used operation in DSP/Math processors, robots, air traffic controller, digital computers. Because of its vast areas of application, t...
متن کاملApproach based on instruction selection for fast and certified code generation
Nowadays floating-point arithmetic [1] has become ubiquitous in the specification and implementation of programs, including those targeted at embedded systems. However, for the sake of chip area or power consumption constraints, some of these embedded systems are still shipped with no floatingpoint unit. In this case, only integer arithmetic is available at the hardware level. Hence, in order t...
متن کاملAccelerating Integer Neural Networks On Low Cost DSPs
In this paper, low end Digital Signal Processors (DSPs) are applied to accelerate integer neural networks. The use of DSPs to accelerate neural networks has been a topic of study for some time, and has demonstrated significant performance improvements. Recently, work has been done on integer only neural networks, which greatly reduces hardware requirements, and thus allows for cheaper hardware ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013